We are going to perform a quality control (QC) analysis of the mapping results obtained by running cellranger version 7.0.0.
We will pull together all the libraries from all the CSF subprojects.
## [1] "CSF_01"
## [1] "4608"
## [1] "4839"
## [1] "5700"
## [1] "5792"
## [1] "5929"
## [1] "CSF_02"
## [1] "7921"
## [1] "7974"
## [1] "CSF_03"
## [1] "3054"
## [1] "3087"
## [1] "3887"
## [1] "8102"
We will start by showing the most relevant metrics (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to exonic regions, and median genes per cell) obtained by cellranger for each of the working libraries. This information gives us an idea of the quality of the experiment as well as of the sequencing and mapping steps.
GEX QC metrics (cellranger v7.0.0)

| Subproject | GemID | Cells | Median UMI counts per cell | Median genes per cell | Median reads per cell | Total genes detected | Number of reads |
|---|---|---|---|---|---|---|---|
| CSF_01 | 4608 | 3984 | 1623 | 858 | 19472 | 22942 | 266.97M |
| CSF_01 | 4839 | 3504 | 2964 | 1303 | 41738 | 22837 | 306.87M |
| CSF_01 | 5700 | 8989 | 2310 | 1049 | 16548 | 23935 | 298.90M |
| CSF_01 | 5792 | 7602 | 3659 | 1517 | 22742 | 25079 | 303.81M |
| CSF_01 | 5929 | 6962 | 3095 | 1319 | 16182 | 24309 | 261.68M |
| CSF_02 | 7921 | 4813 | 3998 | 1615 | 30334 | 23910 | 315.94M |
| CSF_02 | 7974 | 10498 | 4237 | 1825 | 31747 | 26802 | 610.99M |
| CSF_03 | 3054 | 1264 | 2696 | 1246 | 65543 | 22785 | 297.91M |
| CSF_03 | 3087 | 4294 | 5740 | 1968 | 43336 | 25201 | 305.29M |
| CSF_03 | 3887 | 2480 | 3094 | 1292 | 28178 | 26006 | 257.35M |
| CSF_03 | 8102 | 9637 | 2641 | 1030 | 19239 | 24883 | 360.44M |
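
Per-library metrics such as the ones in the table above are typically read from the `metrics_summary.csv` file that cellranger writes for each library, where numbers carry thousands separators and percent signs. The following Python sketch (independent of the code that generated this report) shows one way to turn such a row into numeric values. The helper names `parse_metric` and `read_metrics` are hypothetical, and the two-column example content is illustrative rather than a full cellranger file.

```python
import csv
import io


def parse_metric(value):
    """Convert a metric string such as "3,984" or "85.8%" to a float."""
    text = value.strip().replace(",", "")
    if text.endswith("%"):
        return float(text[:-1])
    return float(text)


def read_metrics(handle):
    """Read a one-row metrics summary CSV into a {metric name: float} dict."""
    row = next(csv.DictReader(handle))
    return {name: parse_metric(value) for name, value in row.items()}


# Illustrative content only: two columns, filled with the values
# reported above for library 4608.
example = io.StringIO(
    "Estimated Number of Cells,Median Genes per Cell\n"
    '"3,984",858\n'
)
metrics = read_metrics(example)
print(metrics)
```

In practice the same function could be called once per library on an opened file and the resulting dictionaries stacked into one summary table.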
Next, we will check the quality of the mapping step performed by
cellranger 7.0.0 across libraries. To do so, we will compare the
percentage of reads mapped to the genome and, within these mapped reads,
the percentage mapped to intergenic, intronic, and exonic regions. We aim
to obtain libraries with a high percentage of confidently mapped reads,
and especially a high percentage of exonic reads, which reflect gene
expression. Reads mapping to intergenic regions suggest contamination
with ambient DNA, whereas reads mapping to intronic regions may come from
pre-mRNAs or from mature spliced isoforms that retain certain introns.
## [1] "Confidently_mapped_to_genome"
## [1] "Confidently_mapped_to_intergenic_regions"
## [1] "Confidently_mapped_to_intronic_regions"
## [1] "Confidently_mapped_to_exonic_regions"
## [1] "Confidently_mapped_antisense"
## [1] "Confidently_mapped_to_transcriptome"
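
To compare the intergenic, intronic, and exonic categories within the mapped reads, each percentage can be rescaled by the percentage of reads confidently mapped to the genome. The Python sketch below illustrates this; it assumes all four inputs are percentages of total reads, the function name `region_shares` is hypothetical, and the input numbers are made up for illustration (they are not taken from any library in this report).

```python
def region_shares(genome, intergenic, intronic, exonic):
    """Express each region as a percentage of the confidently mapped reads.

    All inputs are assumed to be percentages of total reads.
    """
    if genome <= 0:
        raise ValueError("percentage mapped to the genome must be positive")
    regions = {
        "intergenic": intergenic,
        "intronic": intronic,
        "exonic": exonic,
    }
    return {name: round(100 * value / genome, 1) for name, value in regions.items()}


# Made-up example values, for illustration only.
shares = region_shares(genome=90.0, intergenic=4.5, intronic=27.0, exonic=58.5)
print(shares)
```

A library dominated by the exonic share is the desired outcome; a large intergenic share would be the warning sign described above.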
After assessing the mapped reads, it is important to assess the sequencing saturation and depth of each library. Sequencing saturation depends on library complexity and sequencing depth. Library complexity is the total number of different transcripts present in the library, and it varies between cell types and tissues, whereas sequencing depth is the number of paired reads per cell. For this reason, we will plot the number of detected genes as a function of depth (sequenced reads). As sequencing depth increases, more genes are detected, but this function eventually reaches a plateau, where additional reads no longer yield additional detected genes; at that point we can be confident that the library was sequenced to saturation. More specifically, sequencing saturation is the fraction of confidently mapped, valid cell-barcode, valid UMI reads that had a non-unique (cell-barcode, UMI, gene) combination.
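
The saturation definition above can be made concrete with a small Python sketch. It assumes that the fraction of reads with a non-unique (cell-barcode, UMI, gene) combination equals one minus the ratio of distinct combinations to total reads, and that the input already contains only confidently mapped reads with valid barcodes and UMIs. The function name and the toy reads are hypothetical.

```python
def sequencing_saturation(reads):
    """Return 1 - (distinct (barcode, UMI, gene) combinations / total reads).

    `reads` holds one (barcode, umi, gene) tuple per confidently mapped
    read with a valid cell barcode and a valid UMI.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("at least one read is required")
    return 1 - len(set(reads)) / len(reads)


# Toy example: 8 reads that collapse to 5 distinct combinations.
toy_reads = (
    [("AAAC", "UMI1", "CD3E")] * 3
    + [("AAAC", "UMI2", "CD3E")]
    + [("AAAC", "UMI3", "CD4")] * 2
    + [("TTTG", "UMI1", "CD3E")]
    + [("TTTG", "UMI4", "MS4A1")]
)
saturation = sequencing_saturation(toy_reads)
print(saturation)  # 0.375
```

A value close to 1 means that most additional reads would only re-sequence molecules that were already observed.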
For the VDJ-T libraries, we will start by showing the most relevant metrics (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to any V(D)J gene, and cells with productive V-J spanning pair) obtained by cellranger for each of the working libraries. This information gives us an idea of the quality of the experiment as well as of the sequencing and mapping steps.
VDJ-T QC metrics (cellranger v7.0.0)

| Subproject | GemID | Number of reads | Estimated number of cells | Fraction of reads in cells (%) | Mean reads per cell | Reads mapped to any V(D)J gene (%) | Cells with productive V-J spanning pair (%) |
|---|---|---|---|---|---|---|---|
| CSF_01 | 4608 | 13071011 | 815 | 34.80 | 0.02M | 38.45 | 58.65 |
| CSF_01 | 4839 | 15751069 | 1912 | 77.62 | 0.01M | 72.19 | 77.62 |
| CSF_01 | 5700 | 14127176 | 5719 | 62.66 | 0.00M | 51.65 | 64.84 |
| CSF_01 | 5792 | 18090339 | 2729 | 43.33 | 0.01M | 31.17 | 69.11 |
| CSF_01 | 5929 | 14573775 | 4297 | 52.51 | 0.00M | 82.88 | 80.78 |
| CSF_02 | 7921 | 19365195 | 3254 | 84.28 | 0.01M | 77.96 | 82.27 |
| CSF_02 | 7974 | 23961188 | 5087 | 83.40 | 0.00M | 83.33 | 81.80 |
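
The "Mean reads per cell" column above is expressed in millions, which hides most of the differences between libraries. As a rough check, the Python sketch below recomputes the value from the two other columns, under the assumption that mean reads per cell is simply the number of reads divided by the estimated number of cells (this is consistent with the rounded values in the table, but it remains an assumption). The function name is hypothetical.

```python
# (number of reads, estimated number of cells), copied from the table above.
vdj_t_libraries = {
    "4608": (13071011, 815),
    "4839": (15751069, 1912),
    "5700": (14127176, 5719),
    "5792": (18090339, 2729),
    "5929": (14573775, 4297),
    "7921": (19365195, 3254),
    "7974": (23961188, 5087),
}


def mean_reads_per_cell(n_reads, n_cells):
    """Assumed definition: total reads divided by estimated cells, rounded."""
    if n_cells <= 0:
        raise ValueError("the number of cells must be positive")
    return round(n_reads / n_cells)


recomputed = {
    gem_id: mean_reads_per_cell(n_reads, n_cells)
    for gem_id, (n_reads, n_cells) in vdj_t_libraries.items()
}
for gem_id, value in recomputed.items():
    print(gem_id, value)
```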
Next, we will check the quality of the V(D)J mapping step performed
by cellranger 7.0.0 across libraries. To do so, we will
compare the percentage of reads mapped to any germline V(D)J gene
segment and, within these mapped reads, the percentage mapped to TRA
and TRB germline gene segments.
Here, we will assess the median number of UMIs assigned to a TRA/TRB contig per cell. Low values for either of the two parameters can indicate cells with extremely low TRA/TRB expression or poor cell quality, among others.
VDJ-T expression (cellranger v7.0.0)

| GemID | Median TRA UMIs per cell | Median TRB UMIs per cell |
|---|---|---|
| 4608 | 2 | 4 |
| 4839 | 3 | 7 |
| 5700 | 3 | 8 |
| 5792 | 3 | 7 |
| 5929 | 5 | 10 |
| 7921 | 4 | 10 |
| 7974 | 5 | 10 |
Now, we will check the V(D)J annotation for the studied samples. To better interpret the results, we will consider the information given in the cellranger web summary file. We will assess the following fractions of cell-associated barcodes (with at least…):

- Cells With TRA/TRB Contig: one TRA/TRB contig annotated as a full or partial V(D)J gene.
- Cells With CDR3-annotated TRA/TRB Contig: one TRA/TRB contig where a CDR3 was detected.
- Cells With Productive TRA/TRB Contig: one contig that spans the 5’ end of the V region to the 3’ end of the J region for TRA/TRB, has a start codon in the expected part of the V sequence, has an in-frame CDR3, and has no stop codons in the aligned V-J region.
- Cells With Productive V-J Spanning Pair: one productive contig for each chain of the receptor pair. We also report the corresponding number of cells with a productive V-J spanning pair.

For all of the previous parameters, low values can indicate poor cell quality, low yield from the RT reaction, or poor specificity of the V(D)J enrichment. Moreover, we will also check the estimated number of recovered cells and the paired clonotype diversity.
VDJ-T V(D)J annotation (cellranger v7.0.0)

| GEM ID | Estimated number of recovered cells | Productive V-J spanning pair (%) | Cells with productive V-J spanning pair | Paired clonotype diversity | Productive TRA contig (%) | Productive TRB contig (%) |
|---|---|---|---|---|---|---|
| 4608 | 815 | 58.65 | 478 | 109.05 | 65.28 | 93.37 |
| 4839 | 1912 | 77.62 | 1484 | 457.31 | 80.65 | 96.97 |
| 5700 | 5719 | 64.84 | 3708 | 3103.42 | 68.86 | 95.98 |
| 5792 | 2729 | 69.11 | 1886 | 998.72 | 72.63 | 96.48 |
| 5929 | 4297 | 80.78 | 3471 | 648.62 | 83.17 | 97.60 |
| 7921 | 3254 | 82.27 | 2677 | 1905.78 | 84.97 | 97.30 |
| 7974 | 5087 | 81.80 | 4161 | 748.75 | 84.18 | 97.62 |
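
To help read the "Paired clonotype diversity" column, the Python sketch below computes an inverse Simpson index from per-cell clonotype labels. This assumes that the metric is the inverse Simpson index of the paired clonotype frequencies, which is how we understand the cellranger definition; it is not cellranger's own code, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def inverse_simpson(clonotype_ids):
    """Inverse Simpson index: 1 / sum of squared clonotype frequencies."""
    counts = Counter(clonotype_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one cell is required")
    return 1 / sum((n / total) ** 2 for n in counts.values())


# One clonotype label per cell (toy data).
single_clone = inverse_simpson(["c1", "c1", "c1", "c1"])
all_distinct = inverse_simpson(["c1", "c2", "c3", "c4"])
mixed = inverse_simpson(["c1", "c1", "c2", "c3"])
print(single_clone, all_distinct, mixed)
```

Under this definition the value ranges from 1 (a single clonotype) up to the number of cells (every cell carries a different clonotype), so it is most informative when read next to the estimated number of recovered cells.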
For the VDJ-B (BCR) libraries, we will start by showing the most relevant metrics (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to any V(D)J gene, and cells with productive V-J spanning pair) obtained by cellranger for each of the working libraries. This information gives us an idea of the quality of the experiment as well as of the sequencing and mapping steps.
BCR-V(D)J QC metrics (cellranger v7.0.0)

| GEM ID | Number of Reads | Estimated Number of Recovered Cells | Fraction of Reads in Cells | Mean Reads per Cell | Fraction of Reads Mapped to any VDJ gene | Cells With Productive V-J Spanning Pair |
|---|---|---|---|---|---|---|
| 4608 | 280.04M | 2 | 0.0% | 140021911 | 2.2% | 0 |
| 4839 | 13.55M | 13 | 48.3% | 1042457 | 47.6% | 13 |
| 5929 | 13.87M | 414 | 92.0% | 33499 | 90.5% | 387 |
| 7921 | 18.35M | 134 | 16.9% | 136953 | 16.8% | 125 |
| 7974 | 19.85M | 1852 | 88.1% | 10717 | 88.5% | 1311 |
Next, we will check the quality of the V(D)J mapping step performed
by cellranger 7.0.0 across libraries. To do so, we will
compare the percentage of reads mapped to any germline V(D)J gene
segment and, within these mapped reads, the percentage mapped to IGH,
IGK, and IGL germline gene segments.
Here, we will assess the median number of UMIs assigned to an IGH/IGK/IGL contig per cell. Low values for any of the three parameters can indicate cells with extremely low IGH/IGK/IGL expression or poor cell quality, among others.
VDJ-B expression (cellranger v7.0.0)

| GEM ID | Median IGH UMIs per Cell | Median IGK UMIs per Cell | Median IGL UMIs per Cell |
|---|---|---|---|
| 4608 | 6.0 | 21 | NA |
| 4839 | 1995.0 | 7406 | 85 |
| 5929 | 2185.5 | 4343 | 5830 |
| 7921 | 10.5 | 26 | 30 |
| 7974 | 12.0 | 29 | 27 |
Now, we will check the V(D)J annotation for the studied samples. To better interpret the results, we will consider the information given in the cellranger web summary file. We will assess the following fractions of cell-associated barcodes (with at least…):

- Cells With IGH/IGK/IGL Contig: one IGH/IGK/IGL contig annotated as a full or partial V(D)J gene.
- Cells With CDR3-annotated IGH/IGK/IGL Contig: one IGH/IGK/IGL contig where a CDR3 was detected.
- Cells With Productive IGH/IGK/IGL Contig: one contig that spans the 5’ end of the V region to the 3’ end of the J region for IGH/IGK/IGL, has a start codon in the expected part of the V sequence, has an in-frame CDR3, and has no stop codons in the aligned V-J region.
- Cells With Productive V-J Spanning Pair: one productive contig for each chain of the receptor pair. We also report the corresponding number of cells with a productive V-J spanning pair.

For all of the previous parameters, low values can indicate poor cell quality, low yield from the RT reaction, or poor specificity of the V(D)J enrichment. Moreover, we will also check the estimated number of recovered cells and the paired clonotype diversity.
VDJ-B V(D)J annotation (cellranger v7.0.0)

| GEM ID | Productive IGH contig (%) | Productive IGK contig (%) | Productive IGL contig (%) | Estimated number of recovered cells | Productive V-J spanning pair, IGK-IGH (%) | Productive V-J spanning pair, IGL-IGH (%) | Cells with productive V-J spanning pair | Paired clonotype diversity |
|---|---|---|---|---|---|---|---|---|
| 4608 | 0.00 | 100.00 | 0.00 | 2 | 0.00 | 0.00 | 0 | 2.00 |
| 4839 | 100.00 | 61.54 | 38.46 | 13 | 61.54 | 38.46 | 13 | 13.00 |
| 5929 | 93.48 | 67.39 | 33.09 | 414 | 63.04 | 30.92 | 387 | 50.95 |
| 7921 | 93.28 | 57.46 | 42.54 | 134 | 52.99 | 40.30 | 125 | 128.26 |
| 7974 | 71.71 | 61.02 | 38.50 | 1852 | 44.28 | 26.94 | 1311 | 497.09 |
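
The criteria for a productive contig listed above (start codon, in-frame CDR3, no stop codons in the V-J region) can be illustrated with a deliberately simplified Python sketch. It is not cellranger's implementation: it assumes the input sequence has already been trimmed to the V-J region, that the start codon sits at its very first position, and that the CDR3 coordinates are known. The function name, coordinates, and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_productive(v_to_j_seq, cdr3_start, cdr3_end):
    """Simplified productivity check on a V-J nucleotide sequence.

    `cdr3_start` and `cdr3_end` are 0-based, end-exclusive positions of
    the CDR3 within `v_to_j_seq` (assumed to start at the start codon).
    """
    seq = v_to_j_seq.upper()
    starts_with_atg = seq.startswith("ATG")
    # CDR3 must begin on a codon boundary and span whole codons.
    cdr3_in_frame = cdr3_start % 3 == 0 and (cdr3_end - cdr3_start) % 3 == 0
    # Read complete codons in frame from the first base.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    no_stop = not any(codon in STOP_CODONS for codon in codons)
    return starts_with_atg and cdr3_in_frame and no_stop


ok = looks_productive("ATGGCTTGTGCTAGCTTC", cdr3_start=6, cdr3_end=15)
with_stop = looks_productive("ATGGCTTAGGCTAGCTTC", cdr3_start=6, cdr3_end=15)
out_of_frame = looks_productive("ATGGCTTGTGCTAGCTTC", cdr3_start=7, cdr3_end=15)
print(ok, with_stop, out_of_frame)  # True False False
```

The real annotation also has to locate the V and J segments and the CDR3 by alignment, which this sketch takes as given.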
Libraries metadata

| project | subproject | gem_id | library_id | library_name | library_barcode | hashing | type | donor_id | wet_lab |
|---|---|---|---|---|---|---|---|---|---|
| CSF | CSF_01 | 4608 | 276966 | 4608_GEX | AZ8142 | not_hashed | cDNA | 4608_GEX | 4608_GEX |
| CSF | CSF_01 | 4839 | 276967 | 4839_GEX | AZ8143 | not_hashed | cDNA | 4839_GEX | 4839_GEX |
| CSF | CSF_01 | 5929 | 276968 | 5929_GEX | AZ8144 | not_hashed | cDNA | 5929_GEX | 5929_GEX |
| CSF | CSF_01 | 5700 | 276969 | 5700_GEX | AZ8145 | not_hashed | cDNA | 5700_GEX | 5700_GEX |
| CSF | CSF_01 | 5792 | 276970 | 5792_GEX | AZ8146 | not_hashed | cDNA | 5792_GEX | 5792_GEX |
| CSF | CSF_01 | 4608 | 277405 | 4608_TCR | AZ8390 | not_hashed | VDJ-T | 4608_TCR | 4608_TCR |
| CSF | CSF_01 | 4839 | 277406 | 4839_TCR | AZ8391 | not_hashed | VDJ-T | 4839_TCR | 4839_TCR |
| CSF | CSF_01 | 5929 | 277407 | 5929_TCR | AZ8392 | not_hashed | VDJ-T | 5929_TCR | 5929_TCR |
| CSF | CSF_01 | 5700 | 277408 | 5700_TCR | AZ8393 | not_hashed | VDJ-T | 5700_TCR | 5700_TCR |
| CSF | CSF_01 | 5792 | 277409 | 5792_TCR | AZ8394 | not_hashed | VDJ-T | 5792_TCR | 5792_TCR |
| CSF | CSF_01 | 4839 | 277410 | 4839_BCR | AZ8396 | not_hashed | VDJ-B | 4839_BCR | 4839_BCR |
| CSF | CSF_01 | 5929 | 277411 | 5929_BCR | AZ8397 | not_hashed | VDJ-B | 5929_BCR | 5929_BCR |
| CSF | CSF_02 | 7921 | 277387 | 7921_GEX | AZ7864 | not_hashed | cDNA | 7921_GEX | 7921_GEX |
| CSF | CSF_02 | 7974 | 277388 | 7974_GEX | AZ7865 | not_hashed | cDNA | 7974_GEX | 7974_GEX |
| CSF | CSF_02 | 7921 | 277401 | 7921_TCR | AZ8125 | not_hashed | VDJ-T | 7921_TCR | 7921_TCR |
| CSF | CSF_02 | 7974 | 277402 | 7974_TCR | AZ8126 | not_hashed | VDJ-T | 7974_TCR | 7974_TCR |
| CSF | CSF_02 | 7921 | 277403 | 7921_BCR | AZ8127 | not_hashed | VDJ-B | 7921_BCR | 7921_BCR |
| CSF | CSF_02 | 7974 | 277404 | 7974_BCR | AZ8128 | not_hashed | VDJ-B | 7974_BCR | 7974_BCR |
| CSF | CSF_03 | 3087 | 277820 | 3087_GEX | AZ8561 | not_hashed | cDNA | 3087_GEX | 3087_GEX |
| CSF | CSF_03 | 3887 | 277821 | 3887_GEX | AZ8562 | not_hashed | cDNA | 3887_GEX | 3887_GEX |
| CSF | CSF_03 | 8102 | 277822 | 8102_GEX | AZ8563 | not_hashed | cDNA | 8102_GEX | 8102_GEX |
| CSF | CSF_03 | 3054 | 277823 | 3054_GEX | AZ8564 | not_hashed | cDNA | 3054_GEX | 3054_GEX |

## [1] "GEX QC summary table"
## # A tibble: 11 x 26
## Subproj~1 GemID Cells Confi~2 Media~3 Media~4 Media~5 Total~6 Numbe~7 Numbe~8
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CSF_01 4608 3984 85.8 1623 858 19472 22942 2.67e8 0
## 2 CSF_01 4839 3504 93.3 2964 1303 41738 22837 3.07e8 0
## 3 CSF_01 5700 8989 93.0 2310 1049 16548 23935 2.99e8 0
## 4 CSF_01 5792 7602 91.8 3659 1517 22742 25079 3.04e8 0
## 5 CSF_01 5929 6962 79.4 3095 1319 16182 24309 2.62e8 0
## 6 CSF_02 7921 4813 93.8 3998 1615 30334 23910 3.16e8 0
## 7 CSF_02 7974 10498 89.9 4237 1825 31747 26802 6.11e8 0
## 8 CSF_03 3054 1264 86.2 2696 1246 65543 22785 2.98e8 0
## 9 CSF_03 3087 4294 90.2 5740 1968 43336 25201 3.05e8 0
## 10 CSF_03 3887 2480 84.8 3094 1292 28178 26006 2.57e8 0
## 11 CSF_03 8102 9637 91.0 2641 1030 19239 24883 3.60e8 0
## # ... with 16 more variables: Q30_RNA_read <dbl>, Q30_UMI <dbl>,
## # Q30_barcodes <dbl>, Confidently_mapped_antisense <dbl>,
## # Confidently_mapped_to_exonic_regions <dbl>,
## # Confidently_mapped_to_genome <dbl>,
## # Confidently_mapped_to_intergenic_regions <dbl>,
## # Confidently_mapped_to_intronic_regions <dbl>,
## # Confidently_mapped_to_transcriptome <dbl>, ...
## [1] "VDJ-T QC summary table"
## # A tibble: 7 x 23
## Subpro~1 GemID Cells~2 Cells~3 Cells~4 Cells~5 Estim~6 Media~7 Media~8 Numbe~9
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CSF_01 4608 65.3 93.4 58.6 58.6 815 2 4 478
## 2 CSF_01 4839 80.6 97.0 77.6 77.6 1912 3 7 1484
## 3 CSF_01 5700 68.9 96.0 64.8 64.8 5719 3 8 3708
## 4 CSF_01 5792 72.6 96.5 69.1 69.1 2729 3 7 1886
## 5 CSF_01 5929 83.2 97.6 80.8 80.8 4297 5 10 3471
## 6 CSF_02 7921 85.0 97.3 82.3 82.3 3254 4 10 2677
## 7 CSF_02 7974 84.2 97.6 81.8 81.8 5087 5 10 4161
## # ... with 13 more variables: Paired_clonotype_diversity <dbl>,
## # Number_of_reads <dbl>, Number_of_short_reads_skipped <dbl>,
## # Q30_RNA_read <dbl>, Q30_UMI <dbl>, Q30_barcodes <dbl>,
## # Fraction_reads_in_cells <dbl>, Mean_reads_per_cell <dbl>,
## # Mean_used_reads_per_cell <dbl>, Reads_mapped_to_TRA <dbl>,
## # Reads_mapped_to_TRB <dbl>, Reads_mapped_to_any_V_D_J_gene <dbl>,
## # Valid_barcodes <dbl>, and abbreviated variable names 1: Subproject, ...
## [1] "VDJ-B QC summary table"
## # A tibble: 5 x 27
## Subpro~1 GemID Cells~2 Cells~3 Cells~4 Cells~5 Cells~6 Cells~7 Estim~8 Media~9
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CSF_01 4608 0 100 0 0 0 0 2 6
## 2 CSF_01 4839 100 61.5 38.5 61.5 38.5 100 13 1995
## 3 CSF_01 5929 93.5 67.4 33.1 63.0 30.9 93.5 414 2186.
## 4 CSF_02 7921 93.3 57.5 42.5 53.0 40.3 93.3 134 10.5
## 5 CSF_02 7974 71.7 61.0 38.5 44.3 26.9 70.8 1852 12
## # ... with 17 more variables: Median_IGK_UMIs_per_Cell <dbl>,
## # Number_of_cells_with_productive_V_J_spanning_pair <dbl>,
## # Paired_clonotype_diversity <dbl>, Number_of_reads <dbl>,
## # Number_of_short_reads_skipped <dbl>, Q30_RNA_read <dbl>, Q30_UMI <dbl>,
## # Q30_barcodes <dbl>, Fraction_reads_in_cells <dbl>,
## # Mean_reads_per_cell <dbl>, Mean_used_reads_per_cell <dbl>,
## # Reads_mapped_to_IGH <dbl>, Reads_mapped_to_IGK <dbl>, ...